Jaccard similarity

Terms from Artificial Intelligence: humans at the heart of algorithms

Jaccard similarity is a measure of similarity between two documents. Given two documents, doc1 and doc2, the Jaccard similarity operates on the set of distinct words in each (say words1 and words2), and calculates
      | words1 ∩ words2 | / | words1 ∪ words2 |
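That is, the number of distinct words that occur in both documents divided by the number of distinct words that occur in either. The value ranges from 0 (no words in common) to 1 (both documents use exactly the same words). The Jaccard similarity is used heavily in document retrieval algorithms.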
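
The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name jaccard_similarity, the tokenisation rule (lowercase, then split on anything other than letters, digits and apostrophes), and the choice to return 0.0 when both documents are empty are all assumptions made for the example, not part of the definition.

```python
import re


def jaccard_similarity(doc1: str, doc2: str) -> float:
    """Jaccard similarity of the distinct words in two documents."""
    # Assumed tokenisation: lowercase the text and keep runs of
    # letters, digits and apostrophes as words.
    words1 = set(re.findall(r"[a-z0-9']+", doc1.lower()))
    words2 = set(re.findall(r"[a-z0-9']+", doc2.lower()))

    union = words1 | words2
    if not union:
        # Assumed convention: two empty documents score 0.0,
        # which also avoids dividing by zero.
        return 0.0

    # | words1 ∩ words2 | / | words1 ∪ words2 |
    return len(words1 & words2) / len(union)


score = jaccard_similarity("the cat sat on the mat",
                           "the dog sat on the log")
print(round(score, 3))  # 3 shared words out of 7 distinct: 0.429
```

In this example the two documents share the words "the", "sat" and "on", and use seven distinct words between them, so the similarity is 3/7. A real retrieval system would usually apply more careful tokenisation, and possibly stemming or stop-word removal, before comparing the sets.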

Used in Chap. 10: page 141; Chap. 18: page 287